SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing
Authors
Abstract
For the efficient analysis of geo-distributed datasets, cloud providers implement data-parallel jobs across sites (e.g., datacenters and edge clusters), which are generally interconnected by wide-area network links. However, current state-of-the-art data analytic methods fail to make full use of the available computing resources. The main reason is that such methods must wait for the bottleneck site to complete the corresponding data transmission and computation in each phase. Furthermore, such methods may be impractical under bandwidth dynamicity and diverse job parallelism. To this end, we propose a Simultaneous Data Transfer and Processing (SDTP) mechanism to accelerate wide-area data analytics, with joint consideration of bandwidth dynamics and job parallelism. In SDTP, a site can execute its computation, provided it obtains the required input data. As a result, the data loading, map, shuffle, and reduce phases at a site need not wait for the completion of the previous phase at other sites. We further improve the SDTP method by offering more accurate time estimation and generalizing it to dynamic situations. The trace-driven results demonstrate that SDTP reduces the job response time by 19% to 72% compared with state-of-the-art methods.
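The contrast the abstract draws, between waiting for the bottleneck site in each phase and letting each site compute once its own input arrives, can be illustrated with a toy model. The sketch below is not the paper's algorithm: the function names, the single-phase setting, and the per-site transfer and compute times are all hypothetical and chosen only for illustration.

```python
# Toy single-phase model (an assumption for illustration, not the SDTP algorithm).
# transfer[i]: time for site i to receive its input data (hypothetical values)
# compute[i]:  time for site i to process that data (hypothetical values)

def barrier_time(transfer, compute):
    """Phase time when every site waits for the slowest transfer
    before any computation starts."""
    return max(transfer) + max(compute)

def overlapped_time(transfer, compute):
    """Phase time when each site starts computing as soon as
    its own input data has arrived."""
    return max(t + c for t, c in zip(transfer, compute))

transfer = [4.0, 10.0, 6.0]
compute = [8.0, 3.0, 5.0]

print(barrier_time(transfer, compute))     # 18.0
print(overlapped_time(transfer, compute))  # 13.0
```

In this model the overlapped time can never exceed the barrier time, since each site's own transfer plus compute time is bounded by the sum of the two maxima.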
Similar resources
PIXIDA: Optimizing Data Parallel Jobs in Wide-Area Data Analytics
In the era of global-scale services, big data analytical queries are often required to process datasets that span multiple data centers (DCs). In this setting, cross-DC bandwidth is often the scarcest, most volatile, and/or most expensive resource. However, current widely deployed big data analytics frameworks make no attempt to minimize the traffic traversing these links. In this paper, we pre...
Lube: Mitigating Bottlenecks in Wide Area Data Analytics
Over the past decade, we have witnessed exponential growth in the density (petabyte-level) and breadth (across geo-distributed datacenters) of data distribution. It becomes increasingly challenging but imperative to minimize the response times of data analytic queries over multiple geo-distributed datacenters. However, existing scheduling-based solutions have largely been motivated by pre-estab...
Cluster-to-cluster data transfer with data compression over wide-area networks
The recent emergence of ultra high-speed networks up to 100 Gb/s has posed numerous challenges and has led to many investigations on efficient protocols to saturate 100 Gb/s links. However, end-to-end data transfers involve many components, not only protocols, affecting overall transfer performance. These components include disk I/O subsystem, additional computation associated with data streams...
The Wide Area Data
Sharing global remote data over large networks poses two major problems: firstly, the data must be discovered; and secondly, the data must be made accessible to the application. Our aim is to provide a single unified interface to both local and remote data, removing location dependence and improving performance. Our solution incorporates shared memory and caching techniques. A location server prov...
Scalable Bulk Data Transfer in Wide Area Networks
Bulk data transfer in wide area networks (WAN) requires scalable and high network bandwidth. In this paper, we identify a number of the scalability limitations that affect the full utilization of peak theoretical network bandwidth. In addition, we study and classify different offered approaches to overcome some of the identified limitations and increase network bandwidth among Grid components i...
Journal
Journal title: IEEE Transactions on Cloud Computing
Year: 2023
ISSN: 2168-7161, 2372-0018
DOI: https://doi.org/10.1109/tcc.2021.3119991